TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets

Authors

Geo Pertea

Xiaoqiu Huang

Feng Liang

Valentin Antonescu

Razvan Sultana

Svetlana Karamycheva

Yuandan Lee

Joseph White

Foo Cheung

Babak Parvizi

Jennifer Tsai

John Quackenbush

Abstract

TGICL is a pipeline for the analysis of large expressed sequence tag (EST) and mRNA databases. The sequences are first clustered based on pairwise sequence similarity, and each cluster is then assembled individually (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures, including symmetric multiprocessing (SMP) machines and Parallel Virtual Machine (PVM) clusters.
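The clustering step described in the abstract can be illustrated with a minimal sketch. It is not TGICL's actual implementation: it assumes pairwise similarity hits have already been computed by some alignment tool, and it groups sequences by transitive closure (single linkage) using a union-find structure. The function name `cluster_by_similarity`, the `min_identity` threshold of 0.95, and the sample identifiers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cluster_by_similarity(seq_ids, hits, min_identity=0.95):
    """Group sequence IDs into clusters by transitive closure over pairwise hits.

    hits is an iterable of (id_a, id_b, identity) tuples; the 0.95 default
    threshold is an illustrative assumption, not a TGICL setting.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in hits:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(list)
    for s in seq_ids:
        clusters[find(s)].append(s)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input: five sequences and three pairwise hits.
ids = ["est1", "est2", "est3", "est4", "mrna1"]
hits = [
    ("est1", "est2", 0.98),
    ("est2", "mrna1", 0.97),
    ("est3", "est4", 0.90),  # below threshold, so no merge
]
clusters = cluster_by_similarity(ids, hits)
```

In this sketch est1, est2, and mrna1 end up in one cluster because they are linked through est2, while est3 and est4 stay separate. Each resulting cluster could then be passed independently to an assembler, which is also what makes the assembly stage easy to spread across multiple CPUs.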

Similar resources

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species

While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refine...

The TIGR Plant Transcript Assemblies database

The TIGR Plant Transcript Assemblies (TA) database (http://plantta.tigr.org) uses expressed sequences collected from the NCBI GenBank Nucleotide database for the construction of transcript assemblies. The sequences collected include expressed sequence tags (ESTs) and full-length and partial cDNAs, but exclude computationally predicted gene sequences. The TA database includes all plant species f...

A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family

Background: Clustering is a key step in the processing of Expressed Sequence Tags (ESTs). The primary goal of clustering is to put ESTs from the same transcript of a single gene into a unique cluster. Recent EST clustering algorithms mostly adopt the alignment-free distance measures, where they tend to yield acceptable clustering accuracies with reasonable computational time. Despite the fact th...

The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes

Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze g...

Journal:

Bioinformatics

Volume 19, Issue 5

Pages: -

Publication date: 2003
